Acceleration of Covariance Models for Non-coding RNA Search
Abstract
Stochastic context-free grammar (SCFG) based models for non-coding RNA (ncRNA) gene searches are much more powerful than regular-grammar-based models because they can model intramolecular base pairing. SCFG models (also known as covariance models) can be scored exactly using dynamic programming. However, the computational resources needed to compute optimal scores by dynamic programming are too great for most applications. Pre-filtering the database with regular-grammar-based models can significantly improve performance at little or no cost in specificity or sensitivity. While pre-filtering is a major improvement, the algorithm is still far too slow. This paper explores an alternative strategy for finding high-scoring subsequences in the sequence database. Rather than sequentially computing the best score at each database position and subsequence length, as the dynamic programming method does, the approach finds good suboptimal scores throughout the position and length search space and expands the search about these trial solutions.
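The contrast between scanning every position and length and expanding around trial solutions can be illustrated with a small sketch. Everything below is an assumption made for illustration: `toy_score` is a hypothetical stand-in that counts stem pairs and is not a covariance-model score, and the random seeding plus hill climbing in `seeded_local_search` is only one possible reading of "expanding the search about trial solutions", not the method used in the paper. The sketch also assumes the sequence is at least `max_len` long.

```python
import random

# Watson-Crick and G-U wobble pairs (toy pairing rule, an assumption).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def toy_score(seq, start, length, min_loop=3):
    """Hypothetical stand-in for a model score: the number of consecutive
    complementary pairs found working inward from both ends of the window."""
    i, j = start, start + length - 1
    score = 0
    while j - i > min_loop and (seq[i], seq[j]) in PAIRS:
        score += 1
        i += 1
        j -= 1
    return score


def exhaustive_search(seq, min_len, max_len):
    """Score every (start, length) window in order, like a sequential scan."""
    best, evals = (-1, None), 0
    for start in range(len(seq)):
        for length in range(min_len, min(max_len, len(seq) - start) + 1):
            evals += 1
            s = toy_score(seq, start, length)
            if s > best[0]:
                best = (s, (start, length))
    return best, evals


def seeded_local_search(seq, min_len, max_len, n_seeds, n_keep, rng):
    """Sample random windows, keep the best few, and hill-climb around each.
    Returns the best (score, window) found and the number of distinct
    windows that were actually scored."""
    cache = {}

    def score(start, length):
        key = (start, length)
        if key not in cache:
            cache[key] = toy_score(seq, start, length)
        return cache[key]

    def valid(start, length):
        return start >= 0 and min_len <= length <= max_len and start + length <= len(seq)

    # Trial solutions scattered over the position/length search space.
    seeds = []
    for _ in range(n_seeds):
        length = rng.randint(min_len, max_len)
        start = rng.randint(0, len(seq) - length)
        seeds.append((score(start, length), (start, length)))
    seeds.sort(reverse=True)

    # Neighbour moves: shift, grow/shrink one end, grow/shrink both ends.
    moves = ((-1, 0), (1, 0), (0, -1), (0, 1), (-1, 2), (1, -2))
    best = seeds[0]
    for current in seeds[:n_keep]:
        improved = True
        while improved:
            improved = False
            s, (start, length) = current
            for ds, dl in moves:
                cand = (start + ds, length + dl)
                if valid(*cand) and score(*cand) > s:
                    current = (score(*cand), cand)
                    improved = True
                    break
        best = max(best, current)
    return best, len(cache)
```

Calling `exhaustive_search(seq, 5, 12)` and `seeded_local_search(seq, 5, 12, n_seeds=6, n_keep=2, rng=random.Random(0))` on the same sequence and comparing the returned evaluation counts shows the trade-off: the local search scores fewer windows but is not guaranteed to find the optimum.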
Similar resources
Joint Loop End Modeling Improves Covariance Model Based Non-coding RNA Gene Search
The effect of more detailed modeling of the interface between stem and loop in non-coding RNA hairpin structures on efficacy of covariance-model-based non-coding RNA gene search is examined. Currently, the prior probabilities of the two stem nucleotides and two loop-end nucleotides at the interface are treated the same as any other stem and loop nucleotides respectively. Laboratory thermodynami...
CMCompare webserver: comparing RNA families via covariance models
A standard method for the identification of novel non-coding RNAs is homology search by covariance models. Covariance models are constructed for specific RNA families with common sequence and structure (e.g. transfer RNAs). Currently, there are models for 2208 families available from Rfam. Before being included into a database, a proposed family should be tested for specificity (finding only tr...
Rfam: an RNA family database
Rfam is a collection of multiple sequence alignments and covariance models representing non-coding RNA families. Rfam is available on the web in the UK at http://www.sanger.ac.uk/Software/Rfam/ and in the US at http://rfam.wustl.edu/. These websites allow the user to search a query sequence against a library of covariance models, and view multiple sequence alignments and family annotation. The ...
Computational Identification of Micro RNAs and Their Transcript Target(s) in Field Mustard (Brassica rapa L.)
Background: Micro RNAs (miRNAs) are a pivotal part of non-protein-coding endogenous small RNA molecules that regulate the genes involved in plant growth and development, and respond to biotic and abiotic environmental stresses posttranscriptionally. Objective: In the present study, we report the results of a systemic search for identification of new miRNAs in B. rapa using homology-based ...
Efficient non-coding RNA gene searches through classical and evolutionary methods
Successful non-coding RNA gene searching requires examination of long-range intramolecular base pairing possibilities. This results in search algorithms with extremely long run times such that large-scale use of the algorithms often becomes computationally infeasible. Methods for the efficient search of the solution space are examined. A review of the standard dynamic-programming covariance mod...